Click Here!
home account info subscribe login search My ITKnowledge FAQ/help site map contact us


 
Brief Full
 Advanced
      Search
 Search Tips
To access the contents, click the chapter and section titles.

Oracle Performance Tuning and Optimization
(Publisher: Macmillan Computer Publishing)
Author(s): Edward Whalen
ISBN: 067230886x
Publication Date: 04/01/96

Bookmark It

Search this book:
 
Previous Table of Contents Next


Another disadvantage of clusters is a reduction in the performance of INSERT statements. This happens because of the additional complexity of the use of space and because there are multiple tables in the same block. The clustered table also spans more blocks than the individual table would, causing more data to be scanned.

Review of Clusters

A cluster may or may not be useful, depending on how the data is primarily accessed. To decide whether you can benefit from a cluster, consider these guidelines:

  Cluster tables where data is primarily accessed together in a join. The reduced I/O required to bring the additional data into the SGA and the fact that the data is already cached can be a big advantage.
  Do not cluster tables that see a large number of INSERT statements.
  Do not cluster tables if the data in those tables is not frequently accessed together.
  Do not cluster tables if full-table scans are often performed on only one of the tables in the cluster. The additional space required by the cluster and additional I/O reduce performance.

By following these guidelines, you should be able to determine whether a cluster is right for your installation. Be careful: a cluster can hurt performance if improperly used.

Hash Clusters

A hash cluster is similar to a cluster but uses a hash function rather than an index to reference the cluster key. A hash cluster stores the data based on the result of a hash function (a numeric function that determines the data block in the cluster based on the value of the cluster key).

To find the data block in an index cluster, there must first be one or more I/Os to the cluster index to find the correct data block. In a hash cluster, the cluster key itself tells Oracle where the data block is. This arrangement can reduce to one the number of I/Os required to retrieve the row.

In contrast to the index cluster, which stores related data together based on the row’s cluster key value, the hash cluster stores related rows together based on their hash values.

The number of hash values is determined by the value of the HASHKEYS parameter of the CREATE CLUSTER command. The number and size of the cluster keys is very important and should be carefully calculated.

When To Use Hash Clusters

The decision to use hash clusters is an important one. Hash clusters can be beneficial because, when they are effectively used, the requested data can be retrieved in just one I/O. Unfortunately, if a hash cluster is used on a table that is not a good candidate for hashing, performance can be severely degraded.

Although hash clusters can be used in a similar fashion to index clusters, you do not have to cluster the tables to use a hash cluster. In fact, in many cases, it is useful to create a single table as a hash cluster. By using hashing, you can retrieve your data with a single I/O rather than the multiple I/Os required to retrieve the same data using a B* -Tree index.

Because hashing uses the value of the data to calculate the data block the desired data is in, hashing is best used on tables that have unique values for the cluster key and that are queried primarily by equality queries on the cluster key. In the case of equality queries, the data is usually retrieved in one read operation. The cluster key need not be a single column; if the typical query uses an equality on a set of columns, use these columns to create a composite key.

An good candidate for hashing has the following properties:

  Unique cluster keys. Hashing performs best when the value of the cluster key is fairly unique. Because the data is laid out based on the key value, a column with a lot of duplicates is not a good candidate for the hash key.
  Equality queries. The majority of queries are equality queries on the cluster key. This type of query reaps the greatest benefit from the hash cluster.
  Static size. Hashing is optimal when the table or tables are fairly static in size. If the table stays within its initial storage allocation, you do not see any performance degradation from using a hash cluster; but if the table grows out of its initial allocation, performance can degrade. In this case, overflow blocks are required.
  Constant cluster key. Hashing can also degrade performance when the value of the cluster key changes. Because the location of the block in which the data resides is based on the cluster key value, a change in that value can cause the row to migrate to maintain the cluster.
  No table scans. Hashing can degrade the performance of table scans because the scan must read blocks that may not have much data in them. Because the table is originally created by laying out the data in the cluster based on the value of the cluster key, there may be some blocks that have few rows.

Do not use a hash cluster on a table if the application frequently modifies the cluster key or the table is constantly being modified. Because the cluster key is based on a calculation, you can incur significant overhead by constantly recalculating the key.

Any time you have a somewhat static table with a unique column value or set of column values, consider creating a hash cluster.

Review of Hash Clusters

As with index clusters, hash clusters have both advantages and disadvantages. Hash clusters are very efficient in retrieving data based on equality queries on the hash key. If you do not retrieve data based on that key, the query is not hashed. As with the index cluster, you see a performance decrease when executing INSERT statements on a hash cluster.

With both index clusters and hash clusters, carefully consider the access patterns to the tables before deciding whether a cluster can help performance. A wrong decision can end up costing you in terms of performance.

If you can take advantage of hashing by meeting somewhat strict criteria, you will see very good performance. Hashing is extremely efficient under the right conditions.


Previous Table of Contents Next


Products |  Contact Us |  About Us |  Privacy  |  Ad Info  |  Home

Use of this site is subject to certain Terms & Conditions, Copyright © 1996-2000 EarthWeb Inc.
All rights reserved. Reproduction whole or in part in any form or medium without express written permission of EarthWeb is prohibited.